ABSTRACT

The implications of standardized testing for minority
students are explored. Test terminology is described in terms of
objectivity, standardization, reliability, and validity. Primarily,
however, the paper reviews the objectivity of standardized testing,
that is, of those tests which are either norm-referenced or
criterion-referenced. The use of these tests is illustrated. Reasons
for the decline in satisfactory test results are cited, and the
factors reported by the National Academy of Education Committee on
Testing and Basic Skills are emphasized: (1) proliferation of
courses; (2) confusion about the appropriate role of teachers; (3)
slackening of "on task" attention; and (4) dismantling of
opportunities for intensive study in selective academic environments
at the secondary level. Other researchers are broadly cited
pertaining to a critical overview of standardized testing and to some
alternatives to standardized testing. (GK)

What do you do when you find out you have to take a test? Some of us immediately
begin to perspire, others catch their breath, some of us want to run, some get charged
up, and some remain cool. The thought of taking a test makes us think about how
knowledgeable we are about the subject of the test. And for most of us the results
of the test are clear: We were either right or wrong; it was our fault that we
didn't answer a question correctly.

If you think that, you're partly right and partly wrong. This paper explores
standardized tests and their implications for minority students, in the hope
that when you take your next test (or give your next test) you will not so
readily accept the test results or their interpretation.

Terminology 

A test can be objective, proper, standardized, reliable, and valid, and still
be a very bad test. The four main words that resound within the field of
testing are objectivity, standardization, reliability, and validity. This paper will
review primarily the objectivity of standardized testing, but defining all four
terms will establish a common frame of reference.

1. Objectivity means that everyone takes a test under more or less the same
conditions and all the tests are graded under more or less the same conditions.

2. Standardization means the making of arrangements for all students to take
the test under similar conditions, for example, the same available time. It also
means the establishment of norms for performance so that test scores come out as
percentiles. Standardization has no bearing on the quality of the test; it affects
only the reporting of scores. Further, standardization applies to the sample
population with which the test taker is compared. The test maker has established
expected standards of performance determined by administration of the test to a
group of students selected, for example, by age or grade.

3. Reliability means how well a test agrees with itself. The higher the
reliability the better the test. Reliability is generally reported as a
reliability coefficient. The closer to 1.0, the more reliable the test; for
example, .95 is excellent, .90 is pretty good, and .80 is not so good.

4. Validity means the degree to which a test measures what it is supposed to
measure. Validity is extremely difficult to ascertain. It is determined by an
expert or by comparing the test results with some other measure. If a test seems
to an expert to measure what it purports to measure, then it has content or face
validity. For example, it would be relatively easy to establish content or face
validity of a math test designed to measure students' abilities to add one-digit
numbers below 10.

Comparison validity is determined by comparing the test scores with a second measure
of competence. If students score high on a test in math and receive high grades in
the course, then there is a possibility that the test has a high degree of validity.
The only catch is that competence in the math course should be compared to a measure
other than the test.

These four terms — objectivity, standardization, reliability, and validity — 
should be examined each time a test is approached or presented. 
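
The reliability coefficient and comparison validity both rest on the same
calculation: a correlation between two sets of scores for the same students. The
sketch below assumes a test-retest design (the same test given twice) and uses the
Pearson correlation as the coefficient. The function name and the scores are
hypothetical and are not drawn from any study cited in this paper.

```python
from math import sqrt


def reliability_coefficient(first, second):
    """Pearson correlation between two score lists for the same students."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    var_1 = sum((a - mean_1) ** 2 for a in first)
    var_2 = sum((b - mean_2) ** 2 for b in second)
    if var_1 == 0 or var_2 == 0:
        raise ValueError("scores must vary within each list")
    return cov / sqrt(var_1 * var_2)


# Hypothetical scores for six students on two sittings of the same test.
first_sitting = [52, 61, 70, 74, 83, 90]
second_sitting = [55, 59, 72, 71, 85, 88]
r = reliability_coefficient(first_sitting, second_sitting)
print(round(r, 2))
```

Passing test scores and a second, independent measure of competence (course
grades, for example) to the same function gives the kind of comparison described
under comparison validity.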

Standardized tests may be of two kinds: norm-referenced or criterion-referenced.
Norm-referenced tests are used to determine how the performance of a given student
or group of students compares with the performance of a group of students whose
scores are given as the norm. Criterion-referenced tests are used to determine
whether a given student has reached a particular level of performance; they do not
compare the student with other students.2

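
The difference between the two kinds of test can be seen by interpreting one
score both ways. The sketch below is only an illustration: the norm group, the
cutoff, and the function names are hypothetical, and the percentile rank uses one
simple definition (percent of the norm group scoring below the student) among
several in use.

```python
def percentile_rank(score, norm_scores):
    """Norm-referenced reading: percent of the norm group scoring below `score`."""
    if not norm_scores:
        raise ValueError("norm group is empty")
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


def meets_criterion(score, cutoff):
    """Criterion-referenced reading: has the required level been reached?"""
    return score >= cutoff


# Hypothetical norm group and mastery cutoff.
norm_group = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
student_score = 72

print(percentile_rank(student_score, norm_group))  # 70.0
print(meets_criterion(student_score, cutoff=75))   # False
```

The same score of 72 looks respectable against this norm group and yet falls
short of the fixed cutoff, which is why the two interpretations should not be
confused.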
There has been a lot of controversy in recent years about standardized testing,
whose history goes back to the success of the Army Alpha test in World War I. This
paper-and-pencil test was designed to select people who would make "good" soldiers,
and it also effectively screened out Blacks based upon IQ. The schools and colleges
quickly adopted psychological tests to select students, and by 1929, more than 5
million tests were being administered annually. In 1975 the National Education
Association reported that at least 200 million achievement test forms were being
used each year in the United States, and this number represents only 65 percent of
all educational and psychological tests that are administered.3 Tests are thus
used to make individual or institutional decisions, for diagnosis or prognosis,
and for research or evaluation.

A case in point: In 1912 Henry Goddard, one of the original translators of the
Binet Test, imported the test. After testing a representative sample of
new immigrants in 1912, he reported that 83 percent of Jews, 79 percent of Russians,
and 87 percent of Italians were feebleminded. It was pointed out to Congress again
and again that, so far as IQ was concerned, immigrants from Southeastern Europe
were genetically inferior to Nordic immigrants from Northwestern Europe. And, when
Congress passed an immigration law in 1924, it embodied for the first time what was
called "national origin quotas."

The results of standardized tests indicate that students today are not
performing as well on achievement tests as students of five years ago. The Wirtz
Report (1977)5 for the Educational Testing Service (ETS) cites forces outside the
school that have caused the decline of writing skills and Scholastic Aptitude Test
scores: (a) television, (b) divorce rates, (c) lower student motivation, (d) a
decade of distraction, (e) increased absenteeism, and (f) the increasing number of
persons taking the SAT. The National Academy of Education Committee on Testing and
Basic Skills (1977), on the other hand, reported that factors within the school have
been particularly important in causing the decline in writing skills and SAT scores.
These are:

a. Proliferation of courses. With an increased number of options, students have
been able to choose less demanding courses than the traditional requirements
of English, mathematics, science, and social studies (especially history).

b. Confusion about the appropriate role of teachers. Teachers have been given
new and often contradictory models of appropriate pedagogic behavior. But
"the discovery method," "the open classroom," "individualized instruction,"
or "team teaching" were rarely accompanied by teacher training and professional
development programs adequate for effective implementation.

c. The slackening of "on task" attention. The Academy reports some studies
showing that even in relatively "good" classrooms students are "on task"
for only 30 percent of the instructional day. It believes that more
effective use of school time would be a significant reform.

d. The dismantling of opportunities for intensive study in selective academic
environments at the secondary level. For example, programs for the
academically talented have been closed or forced to change admission standards.

A reaction to the decline in test scores is the setting of minimum competency
standards for elementary and secondary students. As of March 15, 1978, 13 states
had taken some type of action to mandate minimum competencies. In the remaining
states either legislation is pending or legislative or state board studies are under
way.7

For Blacks and other minorities standardized tests are being used to determine
whether intelligence is fixed at birth or depends upon environmental factors, as
set forth in the works of Arthur Jensen and William Shockley. In a 1972 study,
psychologist Jane Mercer found that from 50 to 300 percent more Blacks and Mexican
Americans were identified as mentally retarded than could reasonably be expected
from their proportion of the population.8

The National Teacher Examination, designed by the Educational Testing Service,
is used in South Carolina to assess teachers' performance and production. This
is a misuse of the test scores according to ETS, yet the policy is still in effect
and has successfully trimmed the number of Black teachers, from 43 percent of the
teaching force in 1953 to 29 percent in 1975.

North Carolina uses the National Teacher Examination with a cutoff score of 950 
to determine the salary, retention, and tenure of teachers with substantial in-service 
experience and the certification of prospective teachers with no experience. This 
minimum score disqualifies proportionately more Black persons than White. 

Blacks who wanted to become police officers in Washington, D.C., were given
an application and a written test of verbal skills to determine their ability to
be police officers. Two who were rejected took the case to court. The case was
lost in the lower court but won on appeal. The appeals court held that since four
times as many Blacks as Whites failed the test, such disproportionate impact sufficed
to establish a constitutional violation unless the employer could demonstrate that the
skills measured by the test were substantially related to job performance. The
Supreme Court, however, reversed that decision in 1976 (Washington v. Davis).
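
The "four times as many" comparison can be read as a ratio of failure rates
between two groups of applicants. The sketch below shows that arithmetic only; the
counts are hypothetical, the function name is illustrative, and the calculation
is not a legal standard taken from the case.

```python
def failure_rate_ratio(failed_a, tested_a, failed_b, tested_b):
    """Ratio of group A's failure rate to group B's failure rate."""
    for failed, tested in ((failed_a, tested_a), (failed_b, tested_b)):
        if tested <= 0 or not 0 <= failed <= tested:
            raise ValueError("counts must satisfy 0 <= failed <= tested, tested > 0")
    if failed_b == 0:
        raise ValueError("group B has no failures; ratio is undefined")
    # (failed_a / tested_a) / (failed_b / tested_b), written as one division.
    return (failed_a * tested_b) / (failed_b * tested_a)


# Hypothetical counts: 200 applicants tested in each group.
ratio = failure_rate_ratio(failed_a=120, tested_a=200, failed_b=30, tested_b=200)
print(ratio)  # 4.0
```

Using rates rather than raw counts matters when the two groups differ in size:
equal numbers of failures can hide very different failure rates.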

Think back and note the many times a test was used as the sole criterion to
determine whether you were a success or a failure. Also note the times a test was
used as the first of many criteria and, if you failed, you were not allowed to deal
with the next criterion. Yes, tests play an important role throughout our lives.
Our parents and peers rarely listen to our arguments regarding the fairness of the
test; what is generally focused upon is the end result: your score.

Several prominent national organizations are so concerned about the misuse and
abuse of standardized testing that they have called for a moratorium on the practice:
the National Association for the Advancement of Colored People, the National Education
Association, the National Association of Elementary School Principals, and the American
Personnel and Guidance Association. Their call for a moratorium to delay testing
in reality was asking a 300-million-dollar-a-year industry to take a vacation without
pay. (The Bank of New York's Research Division reports that gross earnings for
testing for the three major test publishers and one scoring company totaled $105 million
in 1974. These companies are Houghton Mifflin, which publishes the Iowa Basic Skills;
Harcourt Brace Jovanovich, which publishes Stanford Achievement and Metropolitan
Achievement Tests; and the Westinghouse Measurement Research Center.12 In 1961, the
last year in which IBM-owned Science Research Associates filed an independent report,
net sales totaled $9 million.)

Testing is intricately woven into the fabric of the United States. Fred
Hechinger in the New York Times of May 1, 1977, wrote: "Americans are a nation of
score keepers. They want to know just exactly how they are doing in everything from
sexual performance to their children's third-year experience. There is constant
pressure by parents, school administrators, state education authorities, Congress
and colleges to compare the performance of each child and each district with 'the
norm.' Standardized tests are expected to do this job."

Since the majority of our children must go to school until the age of 16, the
school is the next most important socializing force in our society, after the home.
And since our educational philosophy is based mainly on the Anglo-Saxon ideal,
standardized testing has important implications for minority and low-income students,
particularly with regard to the objectivity of the tests, as pointed out in some of
the literature. Flaugher (1970) submits that there is sketchy but provocative evidence
to indicate that the atmosphere, both physical and psychological, in which an
examination is completed can influence the quality of performance.

Sattler (1969) reports that White examiners affect and at times impede the
performance of Black children. Savage and Bowers (1972), using subtests of the
Wechsler Intelligence Scale for Children, found that both Black and White children
performed better with testers of their own race on block design but not on digit
span. Katz (1970) found that the race-of-the-examiner effect on Blacks is dependent
upon the type of task as well as its complexity and suggests that variables
such as speed and interaction between the examiner and examinee could heighten the
sensitivity during the test. Hawkes and Koss (1970) report that inner-city
students exhibit high levels of general and test-specific anxiety. In the Savage
and Bowers study,20 the race-of-tester effect was mediated by the racial make-up
of the school (monoracial White, monoracial Black, multiracial) and the age of the
children. In the first grade, both Black and White children performed better with
same-race testers. Across school types, however, the effect was stronger in the
all-Black school. There was less tester difference in third and fifth grades,
indicating that the older children had less reaction to the White tester. Black
first- and third-graders in multiracial schools scored consistently higher under
Black testers on both tasks, digit span and block design. Only at grade five was
the race-of-tester effect eliminated. The authors conclude that "contrary to
popular conception, interracial contact appears to have a negative effect when applied
to a biracial test situation." Studies by Williams (1969), Roper (1972), and
Cohen and Roper (1972) reported that the school environment depresses the
performance of Blacks. Katz, Atchision, and Epps and Perry further report that
the type of feedback given to Black college students in a biracial test situation
will affect their performance.

Watson (1972)25, who replicated the works of Katz in Great Britain with West
Indian teenagers, found that Black children who had a White tester and who believed
they were doing an IQ test did less well than Black children who had a Black tester
and who thought the test was a research tool. When the experiment was repeated two
years later with 14- and 15-year-olds, the White tester produced a significant drop
in performance. However, with a White tester who gave no instruction and a Black
tester who did give instructions, a small rise in performance was noted. With
younger children, performance was worse with a White tester in all instruction
conditions.

Thomas et al. (1971) reported that Puerto Rican students tested and retested
by experienced female Puerto Rican bilingual examiners produced significantly higher
scores with examiners who made them feel relaxed and comfortable than with more
formal examiners who carried out the test according to instructions. Five percent
of the children who were tested by the less formal examiner tested in the borderline
defective range, compared with 45 percent examined by the formal examiner.

Integrated Education (1972) reported that the U.S. District Court temporarily
enjoined a San Francisco school district from placing Black children in classrooms
for the mentally retarded based upon IQ scores because Blacks were being so placed
at a rate nearly 2-1/2 times their school population. The Bay Area Association of
Black Psychologists, using optimizing techniques, reexamined the seven Black children
who were the plaintiffs. All of them scored significantly above the cutoff point of
75. The optimizing techniques included increasing rapport and overcoming the
children's defeatist attitudes and distractibility.

Parettl (1975)28 examined sex and race with 268 Black school children in 24
different fifth- and sixth-grade classrooms. He reported that on the average
subjects performed significantly lower with females than with males.

Katz and Zalk (1976) report that differences in response patterns were found as a
function of age, race of examiner, race of subject, and, in some instances, gender.

Nober and Seymour (1974) noted that White student teacher speech recognition
was significantly lower for Black speakers than for White speakers, while Black
listeners' scores for White speakers equalled their scores for Black speakers.

Green et al. (1975) report that naive test takers had significant score
improvements when no time limit was imposed. They further report that middle-class
children are more sophisticated in taking tests than educationally "underclassed"
children, and that most minority and poor children tend to be less motivated than
middle-class children. Both factors can affect test performance irrespective of
ability or knowledge.

Green et al. cite two basic dimensions of test-taking ability: (1) general
know-how, which encompasses such strategies as how to pace oneself, how to avoid
unnecessary errors, knowing when to guess, and how to choose the correct answer by
eliminating incorrect items; and (2) "test-wiseness," which is the ability to take
advantage of irrelevant clues in test items to help answer questions without
necessarily knowing the content.

Finally, the computerized scanner capable of scoring up to forty thousand 
tests an hour provides a quantitative product for each test taken. This highly 
sensitive machine assures high volume scoring and continuity of the concept of
large-scale group testing. What it does not do is allow for the test taker's 
knowledge; an answer is either wrong or right. The scanner doesn't know if the 
question was ambiguous or misread, or if the test taker just forgot the answer. 
It doesn't know how the person was feeling when taking the test. However, the 
scoring of the test was objective. 

What happens with some of the test results:

1. The majority of the students are grouped by ability. (Findley and Byron,
1970, I.)

2. Low achievers are grouped together and deprived of the stimulation of high
achieving children as learning models and helpers. (Findley and Byron, 1970, IV.)

3. Principals and school superintendents are regularly judged by and rewarded
for their pupils' performance.

4. Students are admitted to or rejected by a course, college, or professional
school on the basis of test results.

5. Test results determine the curriculum and how it will be taught in the class.

6. Scores are permanently attached to student records.

7. The results maintain the differential status of population subgroups.
(Mercer, 1974.)

8. Test results legitimize the dominance of the Anglo cultural tradition.

9. They tend to calm parents who are seeking to change educational programs.36

10. Cities, school districts, and schools are compared and allocated resources
on the basis of test results.

To expand on this point, the National Institute of Education has developed a
proposal to allocate Title I funds to school districts on the basis of reading scores
of 9-year-olds every three years rather than on family poverty levels. The proposal
calls for $7.2 million per state over a three-year period to establish a system
to collect achievement data.

Alternatives 

What are some of the alternatives to standardized tests? 

Perrone (1975) suggests interview techniques as adopted by the University of
North Dakota and Prospect School in Bennington, Vermont. This technique requires
systematic documentation of information about a child from the child's teacher and
parents. Perrone further suggests the increased use of diagnostic and
criterion-referenced tests.

Mercer (1974) presents a multicultural model which requires at least the
following changes:

1. Assuming that there are as many normal curves for behavior as there are
distinct life styles, a multicultural classification system would be based on multiple
normal curves and would not evaluate all human behavior with a single statistical
distribution.

2. Since persons from different cultural backgrounds are, in fact, from
statistically different populations, they would not be combined in a single aggregate
for the purpose of establishing norms. The behavior of persons from one cultural
tradition would not be evaluated against the behavioral norms based on persons from
a different cultural heritage.

3. Multiple measures of validity would be used. The validity of a measurement
technique under the Anglo-conformity model is determined by its ability to predict
academic success in an Anglo-American public school system. Determining the validity
of a measurement technique in predicting success in a Mexican American, Puerto Rican,
Native American, or Black cultural setting would necessarily require different
measurement criteria.

4. A multicultural perspective would require clear differentiation between
prognosis and diagnosis, a distinction that is frequently overlooked in present
monocultural evaluation.
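
The first two changes in the list above turn on which norm group a score is
compared with. As a rough illustration only, the sketch below computes a standard
score (distance from the norm-group mean in standard deviations) against a pooled
norm and against a group-specific norm. The groups, the scores, and the function
name are hypothetical, and nothing here reproduces Mercer's actual procedure.

```python
from statistics import mean, pstdev


def standard_score(score, norm_scores):
    """Distance of `score` from the norm-group mean, in standard deviations."""
    spread = pstdev(norm_scores)
    if spread == 0:
        raise ValueError("norm group has no spread")
    return (score - mean(norm_scores)) / spread


# Hypothetical norm samples for two groups with different score distributions.
group_a = [90, 95, 100, 105, 110]
group_b = [70, 75, 80, 85, 90]
pooled = group_a + group_b

score = 80  # a student whose background matches group_b

print(round(standard_score(score, pooled), 2))   # negative: below the pooled mean
print(round(standard_score(score, group_b), 2))  # 0.0: average within group_b
```

The raw score is identical in both lines; only the reference population changes,
and with it the interpretation placed on the student.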

It would appear that standardized tests have limits just as any tests have.

Some of these limits are:

a. Such a test measures what a person knows, not how he/she uses what is known.

b. The test measures what a pupil can do, not what he/she typically does.

c. The test provides multiple choice and does not necessarily reflect a
person's thinking.

d. The score is just for a moment in time and not one that is fixed.

e. The score measures only one dimension of the person's life and neglects
aspects such as energy, ambition, determination, adroitness, likeableness,
luck, perseverance, friends, resources, and politics.

f. The test is monocultural, not multicultural.



What should we do about standardized tests?

1. Find out what types of tests are being administered by the school districts.

2. Use services of the Association of Black Psychologists, the NAACP, and Black
institutions of higher learning to ascertain the validity, reliability, and
standards of the test for minority students.

3. Research and establish how the results of the test are being used, e.g.,
ability grouping, special classes, special curriculum, special programs,
tracking.

4. Examine the racial composition of the special classes, special programs,
and level of tracks.

5. Identify the ranking of minorities on tests given. Meet with school board,
school officials, and local education association to develop alternatives to
standardized tests.

6. Share information with minority community and concerned citizens, e.g.,
ministry, colleges, parents, organizations, and agencies.

7. Encourage associations such as the Association of Black Psychologists to
develop a manual that will identify the different standardized tests and their
negative impact on minorities.

8. Encourage the National Institute of Education to provide funding for
research and training of educators and community people to develop
alternatives to standardized testing.

9. Publicize the misuse and abuse of standardized tests.

Educators and concerned citizens should remember the maxims of Thomas (1977):
(1) know thy test: what is being tested, how to administer the test, and what reliance
to place on the results; (2) know thy student: understand the population and its
sociocultural milieu; and (3) know thyself: personal attitudes and the programs
and practices under one's control. Additionally, educators and concerned citizens
should remember these words of Jerrold R. Zacharias, a professor at MIT: "I feel
emotionally toward the testing industry as I would toward any other merchants of
death. I feel that way because of what they do to the kids. I'm not saying they
murder every child — only 20 percent of them." 